UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE

1b. RESTRICTIVE MARKINGS:
4. PERFORMING ORGANIZATION REPORT NUMBER(S): Technical Report #7
5. MONITORING ORGANIZATION REPORT NUMBER(S): AFOSR-TR.
6a. NAME OF PERFORMING ORGANIZATION: Texas A&M University
6b. OFFICE SYMBOL (If applicable):
6c. ADDRESS (City, State and ZIP Code): College Station, TX 77853
7a. NAME OF MONITORING ORGANIZATION: Air Force Office of Scientific Research
7b. ADDRESS (City, State and ZIP Code):
8a. NAME OF FUNDING/SPONSORING ORGANIZATION: AFOSR
8b. OFFICE SYMBOL (If applicable):
8c. ADDRESS (City, State and ZIP Code): Bolling Air Force Base, Washington, DC 20332
9. PROCUREMENT INSTRUMENT IDENTIFICATION: F49620-85-C-0144
10. SOURCE OF FUNDING NOS. (PROGRAM ELEMENT NO., PROJECT NO., TASK NO., WORK UNIT NO.):
11. TITLE (Include Security Classification): A NOTE ON EXTENDED QUASI-LIKELIHOOD
12. PERSONAL AUTHOR(S): Davidian
13b. TIME COVERED: from 8/87
14. DATE OF REPORT (Yr., Mo., Day):
15. PAGE COUNT: 16
16. SUPPLEMENTARY NOTATION:
17. COSATI CODES (FIELD):
18. SUBJECT TERMS (Continue on reverse if necessary and identify by block number):
exponential family, heteroscedastic regression model, inference for variance
parameters, pseudo-likelihood estimation, variance function estimation

19. ABSTRACT (Continue on reverse if necessary and identify by block number)

We study the method of extended quasi-likelihood estimation and inference of a variance
function recently proposed by Nelder & Pregibon (1987). The estimates are inconsistent
in general, and the test levels can be biased, but in many cases such as the exponential
family the inconsistency and bias will not be a major concern. Extended quasi-likelihood
is compared with Carroll & Ruppert's (1982) pseudo-likelihood method, which gives consistent
estimates and, when slightly modified, asymptotically unbiased tests. We quantify the
notion of a problem in which the amount of statistical information is large in each unit,
showing in this instance that the two estimates are closely related and may be asymptotically
equivalent in many important cases. However, in some cases outside the exponential family,
an asymptotic bias can persist.


20. DISTRIBUTION/AVAILABILITY OF ABSTRACT: UNCLASSIFIED/UNLIMITED, SAME AS RPT., DTIC USERS
21. ABSTRACT SECURITY CLASSIFICATION:
22a. NAME OF RESPONSIBLE INDIVIDUAL: Major Brian Woodruff
22b. TELEPHONE NUMBER (Include Area Code): (202) 767-5026
22c. OFFICE SYMBOL: NM

DD FORM 1473, 83 APR. EDITION OF 1 JAN 73 IS OBSOLETE.

AFOSR-TR. 88-0841

A NOTE ON EXTENDED QUASI-LIKELIHOOD

M.  Davidian 

Department  of  Statistics 
North  Carolina  State  University 
Box  8203 

Raleigh,  North  Carolina,  U.S.A. 

27695-8203 

R.J. Carroll

Department  of  Statistics 
University  of  North  Carolina  at  Chapel  Hill 
321  Phillips  Hall  039  A 
Chapel  Hill,  North  Carolina,  USA 
27514 

SUMMARY 

We study the method of extended quasi-likelihood estimation and
inference of a variance function recently proposed by Nelder & Pregibon
(1987). The estimates are inconsistent in general, and the test levels can
be biased, but in many cases such as the exponential family the
inconsistency and bias will not be a major concern. Extended
quasi-likelihood is compared with Carroll & Ruppert's (1982)
pseudo-likelihood method, which gives consistent estimates and, when
slightly modified, asymptotically unbiased tests. We quantify the notion
of a problem in which the amount of statistical information is large in
each unit, showing in this instance that the two estimates are closely
related and may be asymptotically equivalent in many important cases.
However, in some cases outside the exponential family, an asymptotic bias
can persist.


Keywords:  EXPONENTIAL  FAMILY;  HETEROSCEDASTIC  REGRESSION  MODEL;  INFERENCE 
FOR  VARIANCE  PARAMETERS;  PSEUDO-LIKELIHOOD  ESTIMATION;  VARIANCE 
FUNCTION  ESTIMATION. 


1. INTRODUCTION

Consider the following mean-variance model for observable data $y$:

$$E(y_i) = \mu_i = \mu_i(\beta) = f(x_i,\beta); \qquad \mathrm{var}(y_i) = \sigma^2 g^2(\mu_i,z_i,\theta). \tag{1.1}$$

Here, $y_i$ is the $i$th response variable of $N$ independent observations,
$(x_i,z_i)$ are associated vectors of covariates, $f$ is the regression function,
$\beta$ is a $p$-vector of regression parameters, $\sigma$ is a scale parameter, and
$g$ is the variance function with variance parameter $\theta$ $(r \times 1)$. For
example, the standard deviation may be modeled as proportional to an unknown
power of the mean:

$$g(\mu_i,z_i,\theta) = \mu_i^{\theta}, \qquad \mu_i > 0. \tag{1.2}$$

Special cases of (1.1) are used in applications such as radioimmunoassay,
econometrics, and chemical kinetics. Model (1.1) includes the class of
generalized linear models; see McCullagh & Nelder (1983).

A usual aim is the estimation of $\beta$, with estimation of the variance
function parameters as an adjunct. However, as discussed by Davidian &
Carroll (1987) and Davidian, Carroll & Smith (unpublished), estimation of
the variance function, in particular the parameter $\theta$, is an important
problem both for estimation of $\beta$ and in its own right.

Most methods for estimating $\theta$ are "regression" methods based on
generalized least squares. In these techniques, $\theta$ and $\sigma$ are estimated
by a weighted regression of some function of the absolute residuals from a fit
on their expectations. For example, in location-scale problems squared
residuals have approximate mean proportional to $g^2(\mu_i,z_i,\theta)$ and variance
proportional to $g^4(\mu_i,z_i,\theta)$. Thus an estimate of $\theta$ can be obtained by a
generalized least squares regression of squared residuals on
$\sigma^2 g^2(\hat\mu_i,z_i,\theta)$ with variance function $g^4(\hat\mu_i,z_i,\theta)$, where
$\hat\mu_i = f(x_i,\hat\beta_*)$ and $\hat\beta_*$ is a preliminary estimate of $\beta$. A related
method is the pseudo-likelihood approach of Carroll & Ruppert (1982). In this
method, one pretends $\beta = \hat\beta_*$ and then estimates $(\sigma,\theta)$ by normal theory
maximum likelihood, maximizing $\ell_{PL}(\hat\beta_*,\theta,\sigma)$, where

$$\ell_{PL}(\beta,\theta,\sigma) = -N\log\sigma - \sum_{i=1}^{N}\log[g\{\mu_i(\beta),z_i,\theta\}] - (2\sigma^2)^{-1}\sum_{i=1}^{N}\{y_i - f(x_i,\beta)\}^2/g^2\{\mu_i(\beta),z_i,\theta\}. \tag{1.3}$$

This process may be iterated with a generalized least squares routine for
$\beta$. The number of iterations of the entire procedure for estimation of $\beta$
may be chosen in advance or the process may be iterated until convergence;
see Davidian & Carroll (1987). The pseudo-likelihood method is
asymptotically equivalent to weighted regression on squared residuals with
estimated weights, and full iteration of such a regression yields the
pseudo-likelihood estimate. Both methods can be modified to account for
loss of degrees of freedom for preliminary estimation of $\beta$ as in Harville
(1977); for a discussion and a review of many common methods for estimation
of $\theta$, see Davidian & Carroll (1987).
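The pseudo-likelihood step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the variance function is the power-of-mean model (1.2), the fitted means are treated as known, $\sigma$ is profiled out in closed form, and $\theta$ is found by a grid search. The function names, the simulated data, and the grid are hypothetical and are not part of the report.

```python
import math
import random


def pseudo_loglik(theta, sigma, y, mu):
    """Pseudo-likelihood (1.3) with g(mu, z, theta) = mu**theta (assumed model (1.2))."""
    total = -len(y) * math.log(sigma)
    for yi, mi in zip(y, mu):
        g = mi ** theta
        total -= math.log(g) + (yi - mi) ** 2 / (2.0 * sigma ** 2 * g ** 2)
    return total


def profile_pl_estimate(y, mu, grid):
    """Grid search over theta; for each theta, sigma is set to its maximizing value."""
    best = None
    for theta in grid:
        # Closed-form maximizer of (1.3) in sigma^2 for fixed theta.
        s2 = sum((yi - mi) ** 2 / mi ** (2.0 * theta) for yi, mi in zip(y, mu)) / len(y)
        sigma = math.sqrt(s2)
        value = pseudo_loglik(theta, sigma, y, mu)
        if best is None or value > best[0]:
            best = (value, theta, sigma)
    return best[1], best[2]


# Illustrative simulated data (assumption: normal errors, means treated as known).
rng = random.Random(0)
mu = [1.0 + 0.05 * i for i in range(400)]
true_theta, true_sigma = 1.0, 0.1
y = [m + true_sigma * m ** true_theta * rng.gauss(0.0, 1.0) for m in mu]

grid = [k / 100.0 for k in range(0, 201)]
theta_hat, sigma_hat = profile_pl_estimate(y, mu, grid)
```

In practice the means would come from a preliminary or generalized least squares fit, and the grid search would normally be replaced by a numerical optimizer.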

Pseudo-likelihood and weighted squared residual estimation are based
upon the method of moments. Nelder & Pregibon (1987) instead attempt to
define a family of distributions with mean and variance functions given by
(1.1), this class including as special cases skewed distributions such as
the Poisson or gamma. Their extended quasi-likelihood is

$$\ell_{QL}(\beta,\theta,\sigma) = (-1/2)\sum_{i=1}^{N}\left[\log\{2\pi\sigma^2 g^2(y_i,z_i,\theta)\} + D\{y_i,\mu_i(\beta),z_i,\theta\}/\sigma^2\right], \tag{1.4}$$

where

$$D(y,\mu,z,\theta) = -2\int_{y}^{\mu}\frac{y-w}{g^2(w,z,\theta)}\,dw.$$

The function $\ell_{QL}$ is sometimes but not always an exact log-likelihood.
Under (1.2), if $\theta = 0$, $\ell_{QL}$ is the normal log-likelihood; for $\theta = 1.5$, that
of the inverse Gaussian. For $\theta = .5$, $\sigma = 1$, $\ell_{QL}$ differs from the Poisson
log-likelihood by replacing $y_i!$ by its Stirling approximation; for $\theta = 1$,
$\ell_{QL}$ differs from the gamma log-likelihood by a factor depending on $\sigma$. One
motivation for (1.4) is the Edgeworth expansion of Barndorff-Nielsen and
Cox (1979) or the related saddlepoint approximation of Daniels (1954),
which yield an expansion for the density of the mean of $m$ random variables
from a one parameter exponential family as $m \to \infty$. The leading term of the
expansion at $m = 1$ is the extended quasi-likelihood summand. See Efron
(1986) for a related formulation. Note that the form of $\ell_{QL}$ may be
unsatisfactory in situations for which $g(y,z,\theta) = 0$ for $y = 0$. In this case
Nelder & Pregibon suggest replacing $g(y,z,\theta)$ by $g(y+c,z,\theta)$ for some $c$; we
use this adjustment where applicable in our discussion.
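As a concrete illustration of (1.4), the sketch below evaluates the deviance-type term $D$ and the extended quasi-likelihood under the power-of-mean model (1.2), for which the integral has a closed form. The function names are hypothetical; the code assumes strictly positive $y$ and $\mu$, and the offset `c` is applied only inside the logarithmic term, which is one reading of the adjustment suggested by Nelder & Pregibon.

```python
import math


def power_deviance(y, mu, theta):
    """D(y, mu, z, theta) = 2 * integral from mu to y of (y - w) / w**(2*theta) dw.

    Assumes g(w, z, theta) = w**theta with y > 0 and mu > 0.
    """
    p = 2.0 * theta
    if abs(p - 1.0) < 1e-12:   # theta = 1/2, Poisson-type variance
        return 2.0 * (y * math.log(y / mu) - (y - mu))
    if abs(p - 2.0) < 1e-12:   # theta = 1, gamma-type variance
        return 2.0 * ((y - mu) / mu - math.log(y / mu))
    a = y * (y ** (1.0 - p) - mu ** (1.0 - p)) / (1.0 - p)
    b = (y ** (2.0 - p) - mu ** (2.0 - p)) / (2.0 - p)
    return 2.0 * (a - b)


def extended_quasi_loglik(theta, sigma, y, mu, c=0.0):
    """Extended quasi-likelihood (1.4) under the power-of-mean model."""
    total = 0.0
    for yi, mi in zip(y, mu):
        g2_at_y = (yi + c) ** (2.0 * theta)   # g^2 evaluated at the (shifted) response
        total += math.log(2.0 * math.pi * sigma ** 2 * g2_at_y)
        total += power_deviance(yi, mi, theta) / sigma ** 2
    return -0.5 * total
```

With `theta = 0` the deviance reduces to the squared residual, so the function returns the normal log-likelihood, in line with the special case noted in the text.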


An additional reason for considering approximate likelihoods for a
mean-variance model is that linear exponential families with given
mean-variance relationship do not always exist. For example, Bar-Lev &
Enis (1986) have shown that if the distribution of $y_i$ is an exponential
family with variance function (1.2), it is necessary that
$\theta \notin (-\infty,0) \cup (0,1/2)$, so that such a family exists only when
$\theta \in \{0\} \cup [1/2,\infty)$, and the
general form for the density parameterized in terms of $\theta$ and $\sigma$ is unwieldy.

We have observed in many examples that the pseudo-likelihood and
quasi-likelihood methods lead to similar estimates, although sometimes
inferences for $\theta$ are substantially different. In Section 2, we construct
an asymptotic theory for extended quasi-likelihood which allows an easy
illustration of the relationship between the two methods and suggests a
simple motivation for the form of extended quasi-likelihood. In general,
the extended quasi-likelihood estimate of $\theta$ is inconsistent, and the
resulting test is biased, but in our experience, the inconsistency has not
been major at exponential families. The inconsistency is noted
independently by Morton (1987), who, like Carroll & Ruppert (1982), uses an
estimate of $\theta$ based on squared residuals. These methods have the advantage
of being consistent and, when slightly modified, asymptotically unbiased
for testing. We study extended quasi-likelihood in the case that it is
likely to perform best, namely (Morton, 1987) when the amount of
statistical information is large in each observation. We quantify this
notion, and then show that in this instance the two estimators are nearly
asymptotically equivalent, although extended quasi-likelihood can be
affected by an asymptotic bias while pseudo-likelihood is not when the
underlying distribution is asymmetric and outside the exponential family.
In Section 3 we discuss inference for $\theta$ based on the two approaches. From
the theory of Section 2 we observe that while inference based on asymptotic
theory for the two approaches yields similar results under many conditions,
such a test based on extended quasi-likelihood can be adversely affected by
possible asymptotic bias of the estimator. The difference in test behavior
we have observed may be due to the effect of asymptotic bias.

2. SOME ASYMPTOTIC RESULTS

Neither pseudo-likelihood nor extended quasi-likelihood is an exact
likelihood approach. Pseudo-likelihood is based on the method of
moments, so that the estimating equations are unbiased, and hence
consistency and asymptotic normality obtain under very general conditions
even without the assumption of normality. Let
$\nu(\mu_i,z_i,\theta) = \log g(\mu_i,z_i,\theta)$, let $\nu_\theta(\mu_i,z_i,\theta)$ be its column
vector of partial derivatives with respect to $\theta$, and let

$$u_\theta(\mu_i,z_i,\theta) = \nu_\theta(\mu_i,z_i,\theta) - N^{-1}\sum_{j=1}^{N}\nu_\theta(\mu_j,z_j,\theta), \qquad \xi(\mu,z,\theta) = \lim N^{-1}\sum_{i=1}^{N}u_\theta(\mu_i,z_i,\theta)\,u_\theta(\mu_i,z_i,\theta)^t.$$

Let subscripts denote differentiation with respect to the argument, e.g.,
$g_\mu(\mu_i,z_i,\theta) = \partial g(\mu_i,z_i,\theta)/\partial\mu_i$. Define the errors
$\epsilon_i = (y_i - \mu_i)/\{\sigma g(\mu_i,z_i,\theta)\}$, and assume the $\{\epsilon_i\}$ are independent
with skewness $\zeta_i$ and kurtosis $\kappa_i$; $\kappa_i = 0$ for normality. Let
$\eta = \log\sigma$ and $\tau = (\eta,\theta^t)^t$, and use subscripts PL and QL to denote
pseudo-likelihood and extended quasi-likelihood, respectively.

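RESULT 1 (Davidian and Carroll, 1987). Suppose that
$(\hat\beta_* - \beta)/\sigma = O_p(N^{-1/2})$ and $\hat\tau_{PL} - \tau = O_p(N^{-1/2})$. Then
$\hat\theta_{PL}$ is asymptotically normally distributed with mean $\theta$. If $\sigma \to 0$
simultaneously with $N \to \infty$, then

$$N^{1/2}(\hat\theta_{PL} - \theta) = \{2\xi(\mu,z,\theta)\}^{-1} N^{-1/2}\sum_{i=1}^{N}(\epsilon_i^2 - 1)\,u_\theta(\mu_i,z_i,\theta) + o_p(1). \tag{2.1}$$

If the $\{\epsilon_i\}$ are identically distributed with kurtosis $\kappa$, the covariance
matrix of the asymptotic distribution of $\hat\theta_{PL}$ is given by

$$(2 + \kappa)\{4N\xi(\mu,z,\theta)\}^{-1}. \tag{2.2}$$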
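For the power-of-mean model (1.2) with scalar $\theta$, the quantity in (2.2) is easy to evaluate, because $u_\theta$ is then a centered $\log\mu_i$. The sketch below computes a finite-sample version of $\xi$ (the average over the observed means rather than the limit) and the resulting approximate variance. The function names are hypothetical and the use of the finite-$N$ average in place of the limit is an assumption made for illustration.

```python
import math


def xi_power_model(mu):
    """Finite-N version of xi for g = mu**theta: the mean squared centered log-mean."""
    logs = [math.log(m) for m in mu]
    mean_log = sum(logs) / len(logs)
    return sum((value - mean_log) ** 2 for value in logs) / len(logs)


def pl_asymptotic_variance(mu, kurtosis=0.0):
    """Approximate variance (2.2) of the pseudo-likelihood estimate of theta."""
    n = len(mu)
    return (2.0 + kurtosis) / (4.0 * n * xi_power_model(mu))
```

The formula makes the design dependence explicit: the wider the spread of $\log\mu_i$, the more precisely $\theta$ is estimated, and positive kurtosis inflates the variance relative to the normal case.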


The assumption $\sigma \to 0$ is a useful simplification technically and is
relevant in applications where $\sigma$ is "small" relative to the means as in
assay problems; see Davidian, Carroll & Smith (unpublished). In the gamma
and lognormal distributions, $\sigma$ is the coefficient of variation, which is
often fairly small. Alternatively, think of $y_i$ as the mean of $m$
observations with mean $\mu_i$ and variance $g^2(\mu_i,z_i,\theta)$, equate $\sigma$ and
$m^{-1/2}$, and let $m \to \infty$.

The assumption $\sigma \to 0$ yields a motivation for (1.4). Since the goal of
extended quasi-likelihood is to describe a class of distributions "nearly"
containing exponential families, consider a density $h$ such that

$$\log h(y,\alpha,\theta,\sigma) = \{y\alpha - b(\alpha)\}/\sigma^2 + c(y,\theta,\sigma) \tag{2.3}$$

for some $b$, $c$, and $\alpha = \alpha(\mu,z,\theta)$. To satisfy (1.1) we require
$db(\alpha)/d\alpha = \mu$ and $d^2b(\alpha)/d\alpha^2 = g^2(\mu,z,\theta)$, implying that
$\mu = \{db(\alpha)/d\mu\}\,d\mu/d\alpha$ and $g^2(\mu,z,\theta) = d\mu/d\alpha$. This yields, writing
$b$ now as a function of $\mu$,

$$\alpha = \int_{-\infty}^{\mu}\{1/g^2(u,z,\theta)\}\,du; \qquad b(\mu,z) = \int_{-\infty}^{\mu}\{u/g^2(u,z,\theta)\}\,du.$$

Plugging into (2.3) gives after simplification

$$\log h(y,\alpha,\theta,\sigma) = -(2\sigma^2)^{-1} D(y,\mu,z,\theta) + d(y,\theta,\sigma) \tag{2.4}$$

for some function $d$. For $h$ to be a density we must choose $d$ so that $h$
integrates to one; by approximating the first term on the right side of
(2.4) when $\sigma$ is small we may approximate $d$. Since when $\sigma$ is small we have

$$\sigma^{-2} D(y,\mu,z,\theta) \approx (y-\mu)^2/\{\sigma^2 g^2(y,z,\theta)\}, \qquad d \approx -(1/2)\log\{2\pi\sigma^2 g^2(y,z,\theta)\}.$$

Inserting this in (2.4) yields the summand of (1.4).
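The simplification that leads from (2.3) to (2.4) can be spelled out as follows. This is a sketch which assumes that the integrals converge at the lower limit; the function $r$ is a label introduced here for the terms that do not involve $\mu$.

```latex
\begin{align*}
y\alpha - b(\mu,z)
  &= \int_{-\infty}^{\mu} \frac{y-u}{g^2(u,z,\theta)}\,du \\
  &= \int_{-\infty}^{y} \frac{y-u}{g^2(u,z,\theta)}\,du
     + \int_{y}^{\mu} \frac{y-u}{g^2(u,z,\theta)}\,du \\
  &= r(y,z,\theta) - \tfrac{1}{2}\,D(y,\mu,z,\theta),
\end{align*}
\qquad\text{so that}\qquad
d(y,\theta,\sigma) = r(y,z,\theta)/\sigma^2 + c(y,\theta,\sigma).
```

The last line uses the definition of $D$ given with (1.4), namely $D(y,\mu,z,\theta) = -2\int_y^\mu (y-w)/g^2(w,z,\theta)\,dw$.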

The fact that (1.4) is an approximate log-likelihood implies that $\hat\theta_{QL}$
need not be consistent. With the suggested adjustment $c = 1/6$ as in Nelder
& Pregibon (1987), if the $\{y_i\}$ are distributed as Poisson with means $\{\mu_i\}$
taking on values 1 and 4 in equal proportions and we assume $\theta < 1.0$, the
theory of M-estimation as in Huber (1981, p. 130-132) implies that $\hat\theta_{QL}$
converges to 0.45. If the $\{\mu_i\}$ take on larger values, such as 30, 40 and
50 in equal proportions, however, $\hat\theta_{QL}$ converges to 0.50. Many examples in
regression we have seen suggest that extended quasi-likelihood and
pseudo-likelihood are typically equivalent for power of mean models (1.2).

Since the estimating equation for the extended quasi-likelihood
estimate $\hat\theta_{QL}$ can be biased, standard asymptotic theory for $\hat\theta_{QL}$, while
possible to construct, is not fully informative. As an approximation we
use the small $\sigma$ assumption to construct an asymptotic theory. We also
describe an approach suggested by the Poisson case for "large" $\{\mu_i\}$.

RESULT 2. Suppose that $N^{1/2}(\hat\tau_{QL} - \tau) = O_p(1)$ and
$N^{1/2}(\hat\beta_{QL} - \beta)/\sigma = O_p(1)$ if $N^{1/2}\sigma \to \lambda \ge 0$ as $N \to \infty$,
$\sigma \to 0$. Then

$$N^{1/2}(\hat\theta_{QL} - \theta) = \{2\xi(\mu,z,\theta)\}^{-1} N^{-1/2}\sum_{i=1}^{N}(\epsilon_i^2 - 1)\,u_\theta(\mu_i,z_i,\theta) + (N^{1/2}\sigma)\{6\xi(\mu,z,\theta)\}^{-1} C_N + o_p(1), \tag{2.5}$$

where

$$C_N = N^{-1}\sum_{i=1}^{N}\zeta_i\{g(\mu_i,z_i,\theta)\,\nu_{\theta\mu}(\mu_i,z_i,\theta) - 2 g_\mu(\mu_i,z_i,\theta)\,u_\theta(\mu_i,z_i,\theta)\}.$$

A sketch of the proof is contained in the Appendix. The implication of
(2.5) is that while $\hat\theta_{PL}$ and $\hat\theta_{QL}$ behave similarly, they differ in an
asymptotic fashion through the second term on the right hand side of (2.5)
in a way that might affect asymptotic inference. When $\hat\theta_{QL}$ is
asymptotically normal it will have mean $\theta + \{6N^{1/2}\xi(\mu,z,\theta)\}^{-1}(\lambda C)$, where
$C_N \to C$. From (2.2) and (2.5), then, $\hat\theta_{QL}$ and $\hat\theta_{PL}$ will be asymptotically
equivalent only if $\lambda = 0$ or $C_N \to 0$; the latter will occur for symmetrically
distributed data. If the $\{\epsilon_i\}$ are identically distributed with kurtosis $\kappa$,
for example, then $\hat\theta_{QL}$ and $\hat\theta_{PL}$ will have asymptotic covariance (2.2).

In the case of (1.2), $\nu_\theta(\mu_i,z_i,\theta) = \log\mu_i$ so that

$$C_N = N^{-1}\sum_{i=1}^{N}\zeta_i\,\mu_i^{\theta-1}\{1 - 2\theta(\log\mu_i - l_N)\}, \qquad l_N = N^{-1}\sum_{i=1}^{N}\log\mu_i.$$

For the normal distribution, $\zeta_i = 0$; for the gamma, lognormal, and
inverse Gaussian distributions $\zeta_i = O(\sigma)$ and $\kappa_i = O(\sigma^2)$, so that the
asymptotic bias is 0 and the two estimators are asymptotically equivalent
with covariance the same as if the data were normally distributed with mean
$\mu_i$ and variance $\sigma^2\mu_i^{2\theta}$. From Bar-Lev & Enis (1986), $\zeta_i = O(\sigma)$ for
distributions which are exponential families with $\theta \in \{0\} \cup [1/2,\infty)$. If
the $\{y_i\}$ are not from an exponential family, the asymptotic bias need not
be zero. For example, consider a shifted gamma model
$y_i = \mu_i + \sigma g(\mu_i,z_i,\theta)\epsilon_i$, where $w_i$ has a gamma$(\alpha_i,\psi_i)$ distribution with
$E(w_i) = \alpha_i/\psi_i$, and $\epsilon_i = \{w_i - (\alpha_i/\psi_i)\}/(\alpha_i^{1/2}/\psi_i)$, so that
$E(\epsilon_i^2) = 1$. In this case $\zeta_i = 2\alpha_i^{-1/2}$, so that if the $\{\alpha_i\}$ do not depend
on $\sigma$, the asymptotic bias will not vanish. At exponential family models in
cases not covered by the asymptotic theory here, one might expect
pseudo-likelihood to be more variable than extended quasi-likelihood, since
the latter is based on approximate exponential family likelihoods.
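The size of the second term in (2.5) can be explored numerically for the power-of-mean model. The sketch below evaluates the finite-sample skewness average $C_N$ and the implied shift $\lambda C_N/(6\xi)$ in the limiting distribution of $N^{1/2}(\hat\theta_{QL}-\theta)$ for scalar $\theta$. The function names are hypothetical, and replacing the limits $C$ and $\xi$ by their finite-$N$ counterparts is an assumption made for illustration.

```python
import math


def skewness_average_power_model(mu, zeta, theta):
    """C_N for g = mu**theta: mean of zeta_i * mu_i**(theta-1) * {1 - 2*theta*(log mu_i - l_N)}."""
    logs = [math.log(m) for m in mu]
    l_n = sum(logs) / len(logs)
    terms = [z * m ** (theta - 1.0) * (1.0 - 2.0 * theta * (lg - l_n))
             for z, m, lg in zip(zeta, mu, logs)]
    return sum(terms) / len(terms)


def ql_asymptotic_shift(mu, zeta, theta, lam):
    """Second term of (2.5) for scalar theta: lam * C_N / (6 * xi_N)."""
    logs = [math.log(m) for m in mu]
    l_n = sum(logs) / len(logs)
    xi_n = sum((lg - l_n) ** 2 for lg in logs) / len(logs)
    return lam * skewness_average_power_model(mu, zeta, theta) / (6.0 * xi_n)
```

For symmetric errors every `zeta` entry is zero and the shift vanishes; for the shifted gamma example with a common shape parameter `alpha`, each entry would be `2 / math.sqrt(alpha)`.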


An asymptotic theory for which the means are "large" in which $\sigma$
remains fixed yields a similar result under (1.2) if $\theta < 1$. Let $\mu_0$ be a
sequence to be chosen shortly. Define $\mu_i^* = \mu_i/\mu_0$ and $y_i^* = y_i/\mu_0$ so
that $\epsilon_i = (y_i^* - \mu_i^*)/(\delta\mu_i^{*\theta})$, where $\delta = \sigma\mu_0^{\theta-1}$. If as $N \to \infty$,
$\min_i \mu_i \to \infty$ and $\mu_0 \to \infty$ in such a way that the $\{\mu_i^*\}$ and the $\{y_i^*\}$
are well-behaved, then if $\theta < 1$, $\delta \to 0$ as $N \to \infty$ so that the calculations
here parallel those for the case of small $\sigma$. By analogy, the small $\sigma$ part
of Result 1 holds. Replacing $\sigma$ by $\delta$ in (2.5), in the Poisson case for which
$\theta = .5$ and $\sigma = 1$, $\zeta_i = \mu_i^{-1/2}$ and $\kappa_i = \mu_i^{-1}$, so that $C_N \to 0$ and the
limiting covariance of $\hat\theta_{QL}$ is as if $\kappa_i = 0$. Thus, in the case of "large"
means and data distributed as Poisson, extended quasi-likelihood and
pseudo-likelihood will behave similarly.

The theory presented here is applicable when the small $\sigma$ or large mean
assumption is valid, which is the case in many important situations, and
does not address problems of other types.


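3. INFERENCE FOR $\theta$

The asymptotic distribution theory of Section 2 can be used to
construct tests of $H_0$: $\theta = \theta_0$. Throughout, define

$$A(\mu,z,\theta,\kappa) = \lim N^{-1}\sum_{i=1}^{N}(2+\kappa_i)\,u_\theta(\mu_i,z_i,\theta)\,u_\theta(\mu_i,z_i,\theta)^t.$$

From (1.3), $\hat\theta_{PL}$ maximizes $\ell^*_{PL}(\hat\beta_*,\theta)$, where, maximizing (1.3) over
$\sigma$ for fixed $(\beta,\theta)$,

$$\ell^*_{PL}(\beta,\theta) = -N\log\hat\sigma_{PL}(\beta,\theta) - \sum_{i=1}^{N}\log g\{\mu_i(\beta),z_i,\theta\}, \qquad \hat\sigma^2_{PL}(\beta,\theta) = N^{-1}\sum_{i=1}^{N}\{y_i - f(x_i,\beta)\}^2/g^2\{\mu_i(\beta),z_i,\theta\}.$$

One might reasonably base inference for $\theta$ on a test statistic

$$T_N = -2\left[\ell^*_{PL}\{\hat\beta(\theta_0),\theta_0\} - \ell^*_{PL}\{\hat\beta(\hat\theta_{PL}),\hat\theta_{PL}\}\right],$$

where $\hat\beta(\theta)$ denotes a generalized least squares estimate computed at $\theta$, and
compare to the percentiles of the $\chi^2_r$ distribution. Although $\hat\beta(\hat\theta_{PL})$ and
$\hat\theta_{PL}$ do not necessarily jointly maximize the pseudo-likelihood, the fact
that $\{\hat\beta(\hat\theta_{PL}) - \hat\beta_*\}$ converges in probability to 0 along with a Taylor series
and Result 1 may be used to show that under $H_0$, $T_N$ has asymptotically the
same distribution as the random variable $2W(0)^t\xi(\mu,z,\theta)W(0)$, where $W(M)$
has a normal distribution with mean $M$ and covariance matrix
$\{\xi(\mu,z,\theta)\}^{-1}A(\mu,z,\theta,\kappa)\{\xi(\mu,z,\theta)\}^{-1}/4$.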

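Nelder & Pregibon (1987) suggest a likelihood ratio type test based on
treating the extended quasi-likelihood as an actual likelihood. Such a
test is based on

$$Q_N = -2\left[\ell^*_{QL}\{\hat\beta(\theta_0),\theta_0\} - \ell^*_{QL}\{\hat\beta(\hat\theta_{QL}),\hat\theta_{QL}\}\right],$$

where

$$\ell^*_{QL}(\beta,\theta) = -N\log\hat\sigma_{QL}(\beta,\theta) - \sum_{i=1}^{N}\log g(y_i,z_i,\theta), \qquad \hat\sigma^2_{QL}(\beta,\theta) = N^{-1}\sum_{i=1}^{N}D\{y_i,\mu_i(\beta),z_i,\theta\}.$$

In the situation of Result 2, $Q_N$ has asymptotically the same distribution
as the random variable
$2W[\lambda\{6\xi(\mu,z,\theta)\}^{-1}C]^t\,\xi(\mu,z,\theta)\,W[\lambda\{6\xi(\mu,z,\theta)\}^{-1}C]$ under
$H_0$, where $C_N \to C$.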

The asymptotic distributions of $T_N$ and $Q_N$ need not be $\chi^2_r$ in general.
Appropriately scaled versions of these statistics, say $aT_N$ and $aQ_N$ for some
constant $a$, will be possibly noncentral $\chi^2_r$ if and only if
$A(\mu,z,\theta,\kappa) = 2\xi(\mu,z,\theta)/a$; see, for example, Muirhead (1982, p. 31,
Theorem 1.4.5). Thus, comparison to the percentiles of the $\chi^2_r$ distribution
may be misleading. In many important special cases of practical application,
however, the distributions are readily seen to be chi-square. If the
distributions of the $\{\epsilon_i\}$ are normal, so that $\kappa_i = 0$, then $T_N$ and $Q_N$ are
both asymptotically $\chi^2_r$. If the $\{\epsilon_i\}$ are identically distributed with
kurtosis $\kappa$, since then $A(\mu,z,\theta,\kappa) = (2+\kappa)\xi(\mu,z,\theta)$, it follows that under
$H_0$, $\{2/(2+\kappa)\}T_N$ is asymptotically distributed as $\chi^2_r$, so that a test based
on this statistic with $\kappa$ appropriately estimated is an asymptotic $\alpha$-level
test. McCullagh & Pregibon (1987) consider estimators for the cumulants
for linear regression models. In the situation of Result 2, under $H_0$,
$\{2/(2+\kappa)\}Q_N$ is asymptotically distributed as noncentral $\chi^2_r$ with noncentrality
parameter $\Delta = \lambda^2 C^t\xi(\mu,z,\theta)^{-1}C\{9(2+\kappa)\}^{-1}$. As long as $\Delta = 0$, comparing this
statistic to the percentiles of the $\chi^2_r$ distribution is an asymptotic
$\alpha$-level test which is asymptotically equivalent to the test based on $T_N$.
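The kurtosis correction described above can be sketched for the scalar case $r = 1$. The code assumes that $\kappa$ is estimated by the sample excess kurtosis of standardized residuals, which is one simple choice and not necessarily the estimator the authors have in mind; the function names are hypothetical. The chi-square tail probability with one degree of freedom is obtained from the complementary error function.

```python
import math


def excess_kurtosis(eps):
    """Sample excess kurtosis (fourth standardized moment minus 3)."""
    n = len(eps)
    mean = sum(eps) / n
    m2 = sum((e - mean) ** 2 for e in eps) / n
    m4 = sum((e - mean) ** 4 for e in eps) / n
    return m4 / m2 ** 2 - 3.0


def chi2_sf_1df(x):
    """P(chi-square with 1 df > x), using P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def scaled_test_pvalue(t_n, kappa):
    """p-value for {2 / (2 + kappa)} * T_N against chi-square(1); requires 2 + kappa > 0."""
    return chi2_sf_1df(2.0 / (2.0 + kappa) * t_n)
```

With `kappa = 0` the statistic is unchanged; positive kurtosis shrinks it, so a test that ignores the factor would reject too often under heavy-tailed errors.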

For general, not necessarily identically distributed $\{\epsilon_i\}$, the
asymptotic distribution may not necessarily be chi-square. However, if $r = 1$
so that $\theta$, $\xi(\mu,z,\theta)$, and $A(\mu,z,\theta,\kappa)$ are scalar, as in the important cases
of extra variation in Poisson or binomial models, under $H_0$,
$\{2\xi(\mu,z,\theta)/A(\mu,z,\theta,\kappa)\}T_N$ is asymptotically distributed as $\chi^2_1$, while
$\{2\xi(\mu,z,\theta)/A(\mu,z,\theta,\kappa)\}Q_N$ is asymptotically distributed as noncentral $\chi^2_1$ with
noncentrality parameter $\Delta = \lambda^2C^2/\{9A(\mu,z,\theta,\kappa)\}$. In practice, one might
estimate this factor by computing appropriate estimates for $\xi(\mu,z,\theta)$ and
$A(\mu,z,\theta,\kappa)$. For example, if nearly Poisson data are suspected, one might
estimate $\kappa_i$ by the final estimate for $\mu_i^{-1}$.
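To see how a nonzero noncentrality parameter distorts the level of the test in the scalar case, the following sketch computes $\Delta = \lambda^2C^2/(9A)$ and the probability that a noncentral $\chi^2_1$ variable exceeds the nominal 5% critical value. It relies on the representation of a noncentral $\chi^2_1(\Delta)$ variable as $(Z + \Delta^{1/2})^2$ with $Z$ standard normal; the function names and the default critical value are choices made for this illustration.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def noncentrality(lam, c, a):
    """Scalar noncentrality parameter lam^2 * C^2 / (9 * A)."""
    return lam ** 2 * c ** 2 / (9.0 * a)


def rejection_probability(delta, critical=3.841458820694124):
    """P{(Z + sqrt(delta))^2 > critical}: actual level of a nominal chi-square(1) test."""
    root_crit = math.sqrt(critical)
    shift = math.sqrt(delta)
    upper_tail = 1.0 - std_normal_cdf(root_crit - shift)
    lower_tail = std_normal_cdf(-root_crit - shift)
    return upper_tail + lower_tail
```

When `delta` is zero the function returns the nominal level; any positive value pushes the rejection probability above it, which is the sense in which asymptotic bias in the estimator affects the test.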

Nelder & Pregibon suggest comparing $Q_N$ directly to the percentiles of
the $\chi^2_r$ distribution. Comparing either $T_N$ or $Q_N$ to the $\chi^2_r$ distribution may
be erroneous in general, and even in important practical situations such as
those above in which the distributions are $\chi^2_r$, their suggestion does not
account for the additional multiplicative factor depending on kurtosis
which appears unless the data are normal. In the saddlepoint approximation
approach, $m \to \infty$ implies $\kappa_i \approx 0$; thus they observe that if the underlying
distribution of the data is known to be from an exponential family, then
such a test is asymptotically valid. In our asymptotics, for the cases of
the normal, gamma, and inverse Gaussian examples cited in Section 2 we see
this to be the case. For the Poisson case, one may consider the analogous
"large mean" asymptotic approach at the end of Section 2 to conclude the
same. We further obtain the correct form and properties for a test of this
type when only the mean-variance relationship is specified. For other
approaches to variance function estimation which avoid problems of
kurtosis, see Davidian & Carroll (1987) and Giltinan, Carroll, & Ruppert
(1986).

For a model such as (1.1) for which only the mean and variance are
specified, interest in $\theta$ may be in the context of trying to understand the
structure of the variances, not the form of the underlying distributions.
When appropriate, a chi-square test based on $Q_N$ will approach its nominal
level if $\Delta = 0$. When the underlying distributions of the data are such
that $\hat\theta_{QL}$ is biased asymptotically so that $\Delta \ne 0$, the validity of a $\chi^2$ test
based on $Q_N$ may be seriously affected.

APPENDIX: SKETCH OF PROOF OF RESULT 2

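For convenience, let $\eta = \log\sigma$ and let $(\hat\beta,\hat\theta,\hat\eta)$ be the joint extended
quasi-likelihood estimators for $(\beta,\theta,\eta)$. Let
$\tau(i,\beta,\theta) = \{1,\nu_\theta(\mu_i,z_i,\theta)^t\}^t$,
$\rho(i,\beta,\theta) = f_\beta(x_i,\beta)/g\{\mu_i(\beta),z_i,\theta\}$,
$\Sigma_N = -4N^{-1}\sum_{i=1}^{N}\tau(i,\beta,\theta)\,\tau(i,\beta,\theta)^t$, and
$S_N = N^{-1}\sum_{i=1}^{N}\rho(i,\beta,\theta)\,\rho(i,\beta,\theta)^t$. From (1.4), $(\hat\beta,\hat\theta,\hat\eta)$ solves

$$0 = \begin{pmatrix} Q_{1,N}(\hat\beta,\hat\theta,\hat\eta) \\ Q_{2,N}(\hat\beta,\hat\theta,\hat\eta) \\ Q_{3,N}(\hat\beta,\hat\theta,\hat\eta) \end{pmatrix}, \tag{A.1}$$

where

$$Q_{1,N}(\beta,\theta,\eta) = N^{-1/2}\sum_{i=1}^{N} e^{-2\eta}(y_i - \mu_i)\,\rho(i,\beta,\theta)/g(\mu_i,z_i,\theta);$$

$$Q_{2,N}(\beta,\theta,\eta) = N^{-1/2}\sum_{i=1}^{N}\{e^{-2\eta}D(y_i,\mu_i,z_i,\theta) - 1\};$$

$$Q_{3,N}(\beta,\theta,\eta) = N^{-1/2}\sum_{i=1}^{N}\left[-\tfrac{1}{2}e^{-2\eta}\,\partial D(y_i,\mu_i,z_i,\theta)/\partial\theta - \{\partial\log g(y_i,z_i,\theta)/\partial\theta\}\right].$$

The following result is shown by assuming appropriate smoothness conditions
for $g$ so that $D$ may be differentiated.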


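LEMMA A. Under regularity conditions,

$$N^{-1}\sum_{i=1}^{N}D(y_i,\mu_i,z_i,\theta) = \sigma^2 N^{-1}\sum_{i=1}^{N}\epsilon_i^2 + \sigma^3 N^{-1}\sum_{i=1}^{N}s_{1,i}\,\epsilon_i^3 + O_p(\sigma^4);$$

$$N^{-1}\sum_{i=1}^{N}\partial D(y_i,\mu_i,z_i,\theta)/\partial\theta = -2\left\{\sigma^2 N^{-1}\sum_{i=1}^{N}\epsilon_i^2\,\nu_\theta(\mu_i,z_i,\theta) + \sigma^3 N^{-1}\sum_{i=1}^{N}s_{2,i}\,\epsilon_i^3\right\} + O_p(\sigma^4);$$

$$N^{-1}\sum_{i=1}^{N}\partial^2 D(y_i,\mu_i,z_i,\theta)/\partial\theta\,\partial\theta^t = 2\sigma^2 N^{-1}\sum_{i=1}^{N}\epsilon_i^2\left[3\,\nu_\theta(\mu_i,z_i,\theta)\,\nu_\theta(\mu_i,z_i,\theta)^t - \{g_{\theta\theta}(\mu_i,z_i,\theta)/g(\mu_i,z_i,\theta)\}\right] + O_p(\sigma^3),$$

where $g_{\theta\theta}(\mu_i,z_i,\theta) = \partial^2\{g(\mu_i,z_i,\theta)\}/\partial\theta\,\partial\theta^t$,
$s_{1,i} = -2g_\mu(\mu_i,z_i,\theta)/3$, and
$s_{2,i} = g_{\theta\mu}(\mu_i,z_i,\theta)/3 - \nu_\theta(\mu_i,z_i,\theta)\,g_\mu(\mu_i,z_i,\theta)$.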
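The first expansion in Lemma A can be checked numerically for a single observation. The sketch below takes the Poisson-type case of (1.2) with $\theta = 1/2$, for which $D$ has the closed form $2\{y\log(y/\mu) - (y-\mu)\}$ and $s_1 = -2g_\mu/3 = -2\theta\mu^{\theta-1}/3$, and compares the exact value with the two-term approximation for a small $\sigma$. The function names and the numerical values are illustrative assumptions, and the check concerns one summand rather than the averaged statement of the lemma.

```python
import math


def poisson_type_deviance(y, mu):
    """Exact D(y, mu) for g(mu) = mu**0.5, valid for y > 0 and mu > 0."""
    return 2.0 * (y * math.log(y / mu) - (y - mu))


def lemma_a_approximation(mu, sigma, eps, theta=0.5):
    """Two-term expansion sigma^2 * eps^2 + sigma^3 * s1 * eps^3 with s1 = -2 * g_mu / 3."""
    s1 = -2.0 * theta * mu ** (theta - 1.0) / 3.0
    return sigma ** 2 * eps ** 2 + sigma ** 3 * s1 * eps ** 3


# Illustrative values: one observation one standard deviation above its mean.
mu, sigma, eps = 10.0, 0.01, 1.0
y = mu + sigma * math.sqrt(mu) * eps      # y = mu + sigma * g(mu) * eps
exact = poisson_type_deviance(y, mu)
approx = lemma_a_approximation(mu, sigma, eps)
one_term_error = abs(exact - sigma ** 2 * eps ** 2)
two_term_error = abs(exact - approx)
```

Including the cubic term reduces the approximation error by several orders of magnitude here, consistent with a remainder of order $\sigma^4$.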

A Taylor series in (A.1) using consistency, a Taylor series in $\sigma$ about
0 using Lemma A and laws of large numbers yield after simplification

$$\begin{pmatrix} S_N & 0 \\ 0 & -\tfrac{1}{2}\Sigma_N \end{pmatrix} N^{1/2}\begin{pmatrix} (\hat\beta - \beta)/\sigma \\ \hat\eta - \eta \\ \hat\theta - \theta \end{pmatrix} = \begin{pmatrix} N^{-1/2}\sum_{i=1}^{N}\epsilon_i\,\rho(i,\beta,\theta) \\ N^{-1/2}\sum_{i=1}^{N}(\epsilon_i^2 - 1)\,\tau(i,\beta,\theta) \end{pmatrix} + (N^{1/2}\sigma)\begin{pmatrix} 0 \\ N^{-1}\sum_{i=1}^{N}\epsilon_i^3\,s_i \end{pmatrix} + o_p(1), \tag{A.2}$$

where $s_i = (s_{1,i},s_{2,i}^t)^t$. Equation (A.2) implies that, as $N \to \infty$, $\sigma \to 0$,

$$-\Sigma_N N^{1/2}(\hat\tau_{QL} - \tau) = 2N^{-1/2}\sum_{i=1}^{N}(\epsilon_i^2 - 1)\,\tau(i,\beta,\theta) + 2(N^{1/2}\sigma)N^{-1}\sum_{i=1}^{N}\epsilon_i^3\,s_i + o_p(1).$$

Algebra and simple probability limit calculations yield the result.
Equation (A.2) also shows that in these asymptotics $\hat\beta_{QL}$ is equivalent to a
generalized least squares estimator for $\beta$. □